Effective Straggler Mitigation: Attack of the Clones
Abstract
Small jobs, which are typically run for interactive data analysis in datacenters, continue to be plagued by disproportionately long-running tasks called stragglers. In the production clusters at Facebook and Microsoft Bing, even after state-of-the-art straggler mitigation techniques are applied, these latency-sensitive jobs have stragglers that are on average 8 times slower than the median task in the job. Such stragglers increase the average job duration by 47%. This is because current mitigation techniques all involve an element of waiting and speculation. We instead propose full cloning of small jobs, which avoids waiting and speculation altogether. Cloning small jobs only marginally increases utilization because production workloads show that, while the majority of jobs are small, they consume only a small fraction of the resources. The main challenge of cloning, however, is that extra clones can cause contention for intermediate data. We use a technique called delay assignment that efficiently avoids such contention. An evaluation of our system, Dolly, using production workloads shows that small jobs speed up by 34% to 46% after state-of-the-art mitigation techniques have been applied, using just 5% extra resources for cloning.
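The intuition behind cloning can be illustrated with a minimal Monte Carlo sketch. It is not Dolly's scheduler. The sketch rests on several assumptions. Each clone's runtime is an independent draw from a Pareto distribution, a heavy-tailed model chosen here purely for illustration. A task finishes when its fastest clone finishes, and a job finishes when its slowest task finishes. The function names and the `alpha` parameter are hypothetical. The model also ignores contention for intermediate data, which is the problem delay assignment addresses.

```python
import random
import statistics


def task_time(rng, clones, alpha=1.5):
    # Illustrative assumption: each clone's runtime is an independent
    # Pareto(alpha) draw (always >= 1.0). The task is done as soon as
    # its fastest clone is done.
    return min(rng.paretovariate(alpha) for _ in range(clones))


def job_time(rng, num_tasks, clones, alpha=1.5):
    # A job completes only when its slowest (straggling) task completes.
    return max(task_time(rng, clones, alpha) for _ in range(num_tasks))


def mean_job_time(num_tasks, clones, trials=2000, seed=0, alpha=1.5):
    # Average job duration over many simulated runs, seeded for repeatability.
    rng = random.Random(seed)
    return statistics.fmean(
        job_time(rng, num_tasks, clones, alpha) for _ in range(trials)
    )


if __name__ == "__main__":
    # Compare a 10-task job run with 1, 2, and 3 clones per task.
    for c in (1, 2, 3):
        print(f"clones={c}: mean job time {mean_job_time(10, c):.2f}")
```

Under this model, the minimum of c independent Pareto(alpha) draws with the same scale is itself Pareto(c·alpha), so each additional clone thins the tail that produces stragglers. The cost is c times the slots for the cloned job, which is why cloning is restricted to small jobs that consume little of the cluster.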